Scalable multithreaded algorithms for mutable irregular data with application to anisotropic mesh adaptivity

Author

  • Georgios Rokos
Abstract

Anisotropic mesh adaptation is a powerful way to directly minimise the computational cost of mesh-based simulation. It is particularly important for multi-scale problems, where the required number of floating-point operations can be reduced by orders of magnitude relative to more traditional static-mesh approaches. Increasingly, finite element/volume codes are being optimised for modern multicore architectures. Inter-node parallelism for mesh adaptivity has been successfully implemented by a number of groups using domain decomposition methods. However, thread-level parallelism using programming models such as OpenMP is significantly more challenging, because the underlying data structures are extensively modified during mesh adaptation and a greater degree of parallelism must be realised while keeping the code race-free. In this thesis we describe a new thread-parallel implementation of four anisotropic mesh adaptation algorithms, namely edge coarsening, element refinement, edge swapping and vertex smoothing. For each of the mesh optimisation phases we describe how safe parallel execution is guaranteed by processing work items in batches of independent sets and using a deferred-operations strategy to update the mesh data structures in parallel without data contention. Scalable execution is further assisted by creating worklists using atomic operations, which provides a synchronisation-free alternative to reduction-based worklist algorithms. Additionally, we compare graph colouring methods for the creation of independent sets and present an improved version which can run up to 50% faster than existing techniques. Finally, we describe some early work on an interrupt-driven work-sharing for-loop scheduler, which is shown to perform better than existing work-stealing schedulers.
Combining all of the aforementioned novel techniques, which are generally applicable to other irregular problems, we show that, despite the complex nature of mesh adaptation and its inherent load imbalances, we achieve a parallel effi-
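The abstract's idea of processing work items in batches of independent sets can be illustrated with a minimal sketch. It uses plain sequential first-fit greedy colouring, not the improved parallel colouring method the thesis presents, and the names `greedy_colouring`, `independent_sets` and the toy `adjacency` graph are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def greedy_colouring(adjacency):
    """First-fit colouring: give each vertex the smallest colour
    not already used by one of its coloured neighbours."""
    colour = {}
    for v in adjacency:
        used = {colour[n] for n in adjacency[v] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour


def independent_sets(colour):
    """Group vertices by colour. No two vertices in a group are adjacent,
    so each group could be processed concurrently without conflicts."""
    batches = defaultdict(list)
    for v, c in colour.items():
        batches[c].append(v)
    return [batches[c] for c in sorted(batches)]


# Hypothetical toy mesh graph: a square 0-1-2-3 with the diagonal 0-2.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

colour = greedy_colouring(adjacency)
batches = independent_sets(colour)

for batch in batches:
    # In a thread-parallel code, the vertices of one batch would be
    # handled by different threads; batches run one after another.
    print(batch)
```

Vertices that share an edge always land in different batches, which is the property that lets each batch be updated in parallel without data races on neighbouring mesh entities.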

Similar resources

Experience with Memory Allocators for Parallel Mesh Generation on Multicore Architectures

Scalable and locality-aware multiprocessor memory allocators are critical for harnessing the potential of emerging multithreaded and multicore architectures. This paper evaluates two state-of-the-art generic multithreaded allocators designed for both scalability and locality, against custom allocators, written to optimize the multithreaded implementation of parallel mesh generation algorithms. ...

Thread Parallelism for Highly Irregular Computation in Anisotropic Mesh Adaptation

Thread-level parallelism in irregular applications with mutable data dependencies presents challenges because the underlying data is extensively modified during execution of the algorithm and a high degree of parallelism must be realized while keeping the code race-free. In this article we describe a methodology for exploiting thread parallelism for a class of graph-mutating worklist algorithms...

Zürich Technische

A general 2D-hp-adaptive Finite Element (FE) implementation in Fortran 90 is described. The implementation is based on an abstract data structure, which allows incorporating the full hp-adaptivity of triangular and quadrilateral finite elements. The h-refinement strategies are based on h2-refinement of quadrilaterals and h4-refinement of triangles. For p-refinement we allow the approximation order to ...

SCALABLE, FINITE ELEMENT ANALYSIS OF ELECTROMAGNETIC SCATTERING AND RADIATION: Error Estimation and h-Adaptivity

In this paper a method for simulating electromagnetic fields scattered from complex objects is reviewed; namely, an unstructured finite element code that does not use traditional mesh partitioning algorithms. The complete software package is implemented on the Cray T3D massively parallel processor using both Cray Adaptive FORTRAN (CRAFT) compiler constructs to simplify portions of the code that...

Deadlock free routing algorithms for irregular mesh topology NoC systems with rectangular regions

The simplicity of the regular mesh topology Network on Chip (NoC) architecture leads to reductions in design time and manufacturing cost. A weakness of the regular-shaped architecture is its inability to efficiently support cores of different sizes. One approach proposed in the literature to deal with this is to utilize the region concept, which helps to accommodate cores larger than the tile size in mesh top...

Publication date: 2014